USD PHP Prediction - Joshua's Conundrum Season 2 :)

Executive Summary (editing)

Prediction of USD/PHP forex trading in time of COVID19. Given the current situation, PHP is still a top performer in Asia which makes traders confused. Hence, this project will try to get some insights which might help traders not to get confused. The dataset is from Investing.com, where the timeframe is 10 years 2010 to 2020 (can be changed). The transactions is daily since there is no available hourly data (preferred).

cc: Arvin

I. Features Engineering

Indicators are tools that help an investor or a trader to make a decision whether to buy or sell. Technical indicators (which can be called features in this context) constructed from stock data. In this part we will create following features: Bollinger Bands, RSI, MACD, Moving Average, Return, Momentum, Change and Volatility. These technical indicators will be used since they are easy to compute given our available data.

Return will serve as a target or dependent variable. Other features will serve as independent variables.

Importing Libraries

In [1]:
import functions
import investpy
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pandas_ta as ta
import plotting
import requests
import seaborn as sns
import statsmodels.api as sm
import warnings
from xgboost import XGBClassifier # for features importance

# ARIMA
from sklearn.metrics import mean_squared_error, confusion_matrix, f1_score, accuracy_score
from statsmodels.graphics.tsaplots import plot_pacf, plot_acf
from statsmodels.tsa.arima_model import ARIMA

# Tensorflow 2.0 including Keras
import tensorflow.keras as keras
# Hyper Parameters Tuning with Bayesian Optimization (> pip install bayesian-optimization)
from bayes_opt import BayesianOptimization
from tensorflow.keras.layers import Input, Flatten, TimeDistributed, LSTM, Dense, Bidirectional, Dropout, ConvLSTM2D, Conv1D, GlobalMaxPooling1D, MaxPooling1D, Convolution1D, BatchNormalization, LeakyReLU
from tensorflow.keras.models import Sequential, Model

warnings.filterwarnings('ignore')

Original Data

In [2]:
df = investpy.get_currency_cross_historical_data(currency_cross='USD/PHP',
                                                 from_date='01/08/2010',
                                                 to_date='01/08/2020')
df.columns = ['open', 'high', 'low', 'close', 'Currency']
df.head()
Out[2]:
open high low close Currency
Date
2010-08-02 45.515 45.570 45.290 45.400 PHP
2010-08-03 45.375 45.390 44.950 45.215 PHP
2010-08-04 45.050 45.245 44.985 45.190 PHP
2010-08-05 45.150 45.330 44.930 45.005 PHP
2010-08-06 45.035 45.110 44.750 44.835 PHP
In [3]:
df.describe()
Out[3]:
open high low close
count 2610.000000 2610.000000 2610.000000 2610.000000
mean 46.793904 46.937012 46.675649 46.801993
std 3.894147 3.898215 3.879868 3.887177
min 40.550000 40.620000 40.480000 40.550000
25% 43.500000 43.665000 43.400000 43.501250
50% 45.595000 45.795000 45.475000 45.680000
75% 50.630000 50.799500 50.530000 50.641500
max 54.372000 54.440000 54.225000 54.315000

Checking for missing data

In [4]:
print('No missing data') if sum(df.isna().sum()) == 0 else df.isna().sum()
No missing data

Generating Features

In [5]:
# Log Return Feature
df['Return'] = df.ta.log_return().fillna(0)

# Rate of Change Feature
df['Change'] = df.ta.roc().fillna(0)

# Relative Volatility Index Feature
df['Volatility'] = df.ta.rvi()['RVI_14_4'].fillna(0)

# Moving Average, 7 days
df['MA7'] = df.ta.sma(length=7).fillna(0)
# Moving Average, 20 days
df['MA20'] = df.ta.sma(length=20).fillna(0)

# Exponential Moving Average, 7 days
df['EMA7'] = df.ta.ema(length=7).fillna(0)
# Exponential Moving Average 20 days
df['EMA20'] = df.ta.ema(length=20).fillna(0)

# Momentum
df['Momentum'] = df.ta.mom().fillna(0)

# RSI (Relative Strength Index)
df['RSI'] = df.ta.rsi().fillna(0)

# MACD - (Moving Average Convergence/Divergence)
df['MACD'] = df.ta.macd()['MACD_12_26_9'].fillna(0)
df['Signal'] = df.ta.macd()['MACDS_12_26_9'].fillna(0)

# Upper Band and Lower Band for Bollinger Bands
df['Upper_band'] = df.ta.bbands()['BBU_5'].fillna(0)
df['Lower_band'] = df.ta.bbands()['BBL_5'].fillna(0)

# Parabolic SARS
df['PSAR'] = df.ta.psar()['PSARs_0.02_0.2'].fillna(0)

# Aroon Up and Down
df['Aroon_Up'] = df.ta.aroon()['AROONU_14'].fillna(0)
df['Aroon_Down'] = df.ta.aroon()['AROOND_14'].fillna(0)

df.dropna(inplace=True)

# Saving
df.to_csv('data/forex/USDPHPdaily.csv')

Exploratory Analysis and Machine learning algorithms will rely on features we generated at this point.

II. Exploratory Data Analysis

Generated features will help to understand dynamic properties of historical data.

Check for Correlation

In [6]:
df.corr()[['Return']].sort_values(by='Return', ascending=False)[:5]
Out[6]:
Return
Return 1.000000
RSI 0.337305
Change 0.325372
Momentum 0.324012
Aroon_Up 0.108432

Return correlated with Change and RSI.

In [7]:
plt.figure(figsize=(18,14))
sns.heatmap(df.corr(), annot=True, fmt='.2f')
plt.ylim(17, 0)
plt.title('Correlation Between USD/PHP Features', fontSize=15)
plt.show()

Bollinger Bands, RSI, MACD

In [8]:
plotting.bollinger_bands(df.loc['2019-8':'2020'])
In [9]:
plotting.rsi(df.loc['2019-8':'2020'])
In [10]:
plotting.macd(df.loc['2019-8':'2020'])

Return by Month

In [11]:
plt.figure(figsize=(14,5))
plt.style.use('seaborn-whitegrid')
for i in range(1,13):
    volatility = df[df.index.month==i].Return
    sns.distplot(volatility, hist=False, label=i)
    plt.legend(frameon=True, loc=1, ncol=3, fontsize=10, borderpad=.6, title='Months')
plt.axvline(df.Return.mean(), color='#666666', ls='--', lw=2)
plt.xticks(plt.xticks()[0] + df.Return.mean())
plt.title('USD/PHP Return by Month', fontSize=14)
plt.show()

Returns for Each Day of the Month

In [12]:
plotting.forex_returns(df, value='Return', by='day', scatter=False)

Returns for Each Month of the Year

In [13]:
plotting.forex_returns(df, value='close', by='month', scatter=False)

High-Low and Price

In [14]:
plt.figure(figsize=(16,6))

s = df.loc['2019-8':'2020']
u = s.high.ewm(7).mean()
l = s.low.ewm(7).mean()
plt.fill_between(s.index, u, l, color='#af43af', alpha=0.1, label='High / Low')
plt.plot(s.close, color='#aa43af', label='Price')
plt.plot(s.close.ewm(7).mean(), color='#ff43af', label='Moving Average (7 Days)')
plt.legend(frameon=True, loc=1, borderpad=.6)
plt.title('USD/PHP Close and High-Low', fontSize=15)
plt.show()

Checking for Normality

Machine learning algorithms, including Neural Networks heavily rely on probability in learning process. Let's check target variable Return to normality.

In [15]:
z = lambda x: (x - x.mean()) / x.std()

plt.hist(z(df.Return), bins=30)
plt.title('USD/PHP Return Distribution', fontSize=15)
plt.show()
In [16]:
plt.figure(figsize=(16,6))
sm.qqplot(df.Return, line='s', scale=1)
plt.rcParams['figure.figsize'] = [16.0, 6.0]
plt.title('Normality', fontSize=15)
plt.show()
<Figure size 3632x1362 with 0 Axes>

Insights

  • Return has 89.7842% correlation with Change feature.
  • Close value stays inside Bollinger Bands.
  • Return feature mostly normal.

III. Machine Learning

Modeling part is all about trying different models, tweaking hyperparameters, evaluation, finding creative ways to engineer features and so on.

Steps:

  1. Baseline Model
  2. ARIMA
  3. Features Selection with XGBoost
  4. Deep Neural Networks
    • LSTM Network
    • Convolutional Network
    • Bayesian Optimization

1. Baseline Model

Baseline model would serve as a benchmark for comparing to more complex models.

In [17]:
def baseline_model(forex):
    '''
    \n\n
    Input: Series or Array
    Returns: Accuracy Score
    Function generates random numbers [0,1] and compares them with true values
    \n\n
    '''
    baseline_predictions = np.random.randint(0, 2, len(forex))
    accuracy = accuracy_score(functions.binary(forex), baseline_predictions)
    return accuracy

Accuracy

In [18]:
baseline_accuracy = baseline_model(df.Return)
print('Baseline model accuracy: {:.1f}%'.format(baseline_accuracy * 100))
Baseline model accuracy: 50.7%

Accuracy Distribution

In [19]:
base_preds = []
for i in range(1000):
    base_preds.append(baseline_model(df.Return))
    
plt.figure(figsize=(16,6))
plt.style.use('seaborn-whitegrid')
plt.hist(base_preds, bins=50, facecolor='#4ac2fb')
plt.title('Baseline Model Accuracy', fontSize=15)
plt.axvline(np.array(base_preds).mean(), c='k', ls='--', lw=2)
plt.show()

Insight

Baseline model on average has 49.9% accuracy. We take this number as a guideline for our more complex models.

2. ARIMA

AutoRegressive Integrated Moving Average (ARIMA) is a model that captures a suite of different standard temporal structures in time series data.

  • p: The number of lag observations included in the model, also called the lag order.
  • d: The number of times that the raw observations are differenced, also called the degree of differencing.
  • q: The size of the moving average window, also called the order of moving average.

We will split train and test data to evaluate performance of ARIMA model.

In [20]:
print('USD/PHP historical data contains {} entries'.format(df.shape[0]))
df[['Return']].head()
USD/PHP historical data contains 2610 entries
Out[20]:
Return
Date
2010-08-02 0.000000
2010-08-03 -0.004083
2010-08-04 -0.000553
2010-08-05 -0.004102
2010-08-06 -0.003785

Autocorrelation

Let's take a look at the Autocorrelation Function below. The graph shows how time series data points correlate between each other. We should ignore first value in the graph that shows perfect correlation (value = 1), because it tells how data point is correlated to itself. What's important in this graph is how first data point is correlated to second, third and so on. We can see that it's so weak, it's close to zero. What does it mean to our analysis? It means that ARIMA is pretty much useless here, because it uses previous data points to predict following.

In [21]:
plt.rcParams['figure.figsize'] = (16, 3)
plot_acf(df.Return, lags=range(300))
plt.show()

To make a conclusion we're going to try different orders and see how well they perform on a given data.

In [22]:
# ARIMA orders
orders = [(0,0,0),(1,0,0),(0,1,0),(0,0,1),(1,1,0)]

# Splitting into train and test sets
train = list(df['Return'][:1900].values)
test = list(df['Return'][1900:].values)

all_predictions = {}

for order in orders:
    
    try:
        # History will contain original train set, 
        # but with each iteration we will add one datapoint
        # from the test set as we continue prediction
        history = train.copy()
        order_predictions = []
        
        for i in range(len(test)):
            
            model = ARIMA(history, order=order) # defining ARIMA model
            model_fit = model.fit(disp=0) # fitting model
            y_hat = model_fit.forecast() # predicting 'return'
            order_predictions.append(y_hat[0][0]) # first element ([0][0]) is a prediction
            history.append(test[i]) # simply adding following day 'return' value to the model    
            print('Prediction: {} of {}'.format(i+1,len(test)), end='\r')
        
        accuracy = accuracy_score( 
            functions.binary(test), 
            functions.binary(order_predictions) 
        )        
        print('                             ', end='\r')
        print('{} - {:.1f}% accuracy'.format(order, round(accuracy, 3)*100), end='\n')
        all_predictions[order] = order_predictions
    
    except:
        print(order, '<== Wrong Order', end='\n')
        pass
(0, 0, 0) - 50.1% accuracy   
(1, 0, 0) - 53.8% accuracy   
(0, 1, 0) - 44.5% accuracy   
(0, 0, 1) - 53.1% accuracy   
(1, 1, 0) - 46.3% accuracy   

Review Predictions

In [23]:
# Big Plot
fig = plt.figure(figsize=(16,4))
plt.plot(test, label='Test', color='#4ac2fb')
plt.plot(all_predictions[(1,1,0)], label='Predictions', color='#ff4e97')
plt.legend(frameon=True, loc=1, ncol=1, fontsize=10, borderpad=.6)
plt.title('Arima Predictions', fontSize=15)
plt.xlabel('Days', fontSize=13)
plt.ylabel('Returns', fontSize=13)

# Arrow
plt.annotate('',
             xy=(15, 0.05), 
             xytext=(150, .1), 
             fontsize=10, 
             arrowprops={'width':0.4,'headwidth':7,'color':'#333333'}
            )
# Patch
ax = fig.add_subplot(1, 1, 1)
rect = patches.Rectangle((0,-.05), 30, .1, ls='--', lw=2, facecolor='y', edgecolor='k', alpha=.5)
ax.add_patch(rect)

# Small Plot
plt.axes([.25, 1, .2, .5])
plt.plot(test[:30], color='#4ac2fb')
plt.plot(all_predictions[(0,1,0)][:30], color='#ff4e97')
plt.tick_params(axis='both', labelbottom=False, labelleft=False)
plt.title('Lag')
plt.show()

Histogram

In [24]:
plt.figure(figsize=(16,5))
plt.hist(df[1900:].reset_index().Return, bins=20, label='True', facecolor='#4ac2fb')
plt.hist(all_predictions[(1,1,0)], bins=20, label='Predicted', facecolor='#ff4e97', alpha=.7)
plt.axvline(0, c='k', ls='--')
plt.title('ARIMA True vs Predicted Values Distribution', fontSize=15)
plt.legend(frameon=True, loc=1, ncol=1, fontsize=10, borderpad=.6)
plt.show()

Interpreting Results

In [25]:
test_binary = functions.binary(df[1900:].reset_index().Return)
train_binary = functions.binary(all_predictions[(1,1,0)])
tn, fp, fn, tp = confusion_matrix(test_binary, train_binary).ravel()
accuracy = accuracy_score(test_binary, train_binary)

print("True Positive and Negative: {}".format((tp + tn)))
print("False Positive and Negative: {}".format((fp + fn)))
print("Accuracy: {:.1f}%".format(accuracy*100))
True Positive and Negative: 329
False Positive and Negative: 381
Accuracy: 46.3%

3. Features selection with XGBoost

XGBoost will be used here to extract important features that will be used for neural networks. This might help to improve model accuracy and boost training. Traning will be performed on scaled usdphp dataset.

In [26]:
df.drop('Currency', axis=1, inplace=True)
scaled_usdphp = functions.scale(df, scale=(0,1))
In [27]:
X = scaled_usdphp[:-1]
y = df.Return.shift(-1)[:-1]
In [28]:
# Initializing and fitting a model
xgb = XGBClassifier()
xgb.fit(X, y)
Out[28]:
XGBClassifier(base_score=0.5, booster=None, colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints=None,
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=0, num_parallel_tree=1,
              objective='multi:softprob', random_state=0, reg_alpha=0,
              reg_lambda=1, scale_pos_weight=None, subsample=1,
              tree_method=None, validate_parameters=False, verbosity=None)
In [29]:
important_features = pd.DataFrame({
                                    'Feature': X.columns, 
                                    'Importance': xgb.feature_importances_}) \
                                    .sort_values('Importance', ascending=True)

plt.figure(figsize=(16,8))
plt.style.use('seaborn-whitegrid')
plt.barh(important_features.Feature, important_features.Importance, color="#4ac2fb")
plt.title('XGboost - Feature Importance - USDPHP', fontSize=15)
plt.xlabel('Importance', fontSize=13)
plt.show()

4. Deep Neural Networks (will further optimize)

Preparing Data

In [30]:
n_steps = 21
scaled_usdphp = functions.scale(df, scale=(0,1))

X_train, \
y_train, \
X_test, \
y_test = functions.split_sequences(                        
    scaled_usdphp.to_numpy()[:-1], 
    df.Return.shift(-1).to_numpy()[:-1], 
    n_steps, 
    split=True, 
    ratio=0.8
)

4.1 LSTM Network

In [240]:
keras.backend.clear_session()

n_steps = X_train.shape[1]
n_features = X_train.shape[2]

model = Sequential()

model.add(LSTM(100, activation='relu', return_sequences=False,
               input_shape=(n_steps, n_features))) # 50%
# model.add(Dropout(0.5)) # {20: 58%, 30:36%, 40:42%, 50:46%}
# model.add(LSTM(50, activation='relu', return_sequences=False)) # 42%

# model.add(Dense(100)) # 42%
model.add(Dense(50)) # 64%
# model.add(Dropout(0.5)) # {20: 42%, 30: 58%, 40: 48%%, 50: 58%}
model.add(Dense(1)) # 50%

model.compile(optimizer='adam', loss='mse', metrics=['mae'])
In [241]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (None, 100)               48400     
_________________________________________________________________
dense (Dense)                (None, 50)                5050      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 51        
=================================================================
Total params: 53,501
Trainable params: 53,501
Non-trainable params: 0
_________________________________________________________________
In [242]:
# batch_size = {8: 42%, 16: 58%, 32: 60%, 64: 58%}
model.fit(X_train, y_train, epochs=100, verbose=0,
          validation_data=(X_test, y_test), use_multiprocessing=True)

plt.figure(figsize=(16,4))
plt.plot(model.history.history['loss'], label='Loss')
plt.plot(model.history.history['val_loss'], label='Val Loss')
plt.legend(loc=1)
plt.title('LSTM - Training Process')
plt.show()
In [243]:
pred, y_true, y_pred = functions.evaluation(
                    X_test, y_test, model, random=False, n_preds=50, 
                    show_graph=True)
MSE: 4.182873618717505e-06
Accuracy: 64%

Insight

Network has low MSE which is good since it means that most of the predicted daily returns are near the true daily returns. For the accuracy of finding the underlying trend of the daily returns, the network exceeded the baseline which is good. However, 64% is still not enough for trading large amount of money.

4.2 Convolutional Network

In [308]:
keras.backend.clear_session()

n_steps = X_train.shape[1]
n_features = X_train.shape[2]

model = Sequential()

model.add(Conv1D(filters=20, kernel_size=2, activation='relu',
                 input_shape=(n_steps, n_features))) # 48%
model.add(MaxPooling1D(pool_size=2))
# model.add(Dropout(0.5)) # {20: 42%, 30: 58%, 40: 42%, 50: 58%}
model.add(Conv1D(filters=10, kernel_size=2, activation='relu',
                 input_shape=(n_steps, n_features))) # 60%
model.add(MaxPooling1D(pool_size=2))
# model.add(Dropout(0.5)) # {20: 58%, 30: 58%, 40: 58%, 50: 46%}
# model.add(Conv1D(filters=5, kernel_size=2, activation='relu',
#                  input_shape=(n_steps, n_features))) # 58%
# model.add(MaxPooling1D(pool_size=2))

model.add(Flatten())

# model.add(Dense(50)) # 44%
model.add(Dense(1)) # 60%
model.compile(optimizer='adam', loss='mse', metrics=['mse'])
In [309]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1d (Conv1D)              (None, 20, 20)            820       
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 10, 20)            0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 9, 10)             410       
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 4, 10)             0         
_________________________________________________________________
flatten (Flatten)            (None, 40)                0         
_________________________________________________________________
dense (Dense)                (None, 1)                 41        
=================================================================
Total params: 1,271
Trainable params: 1,271
Non-trainable params: 0
_________________________________________________________________
In [310]:
# batch_size = {8: 50%, 16: 44%, 32: 60%, 64: 50%}
model.fit(X_train, y_train, epochs=25, verbose=0,
          validation_data=(X_test, y_test), use_multiprocessing=True)

plt.figure(figsize=(16,4))
plt.plot(model.history.history['loss'], label='Loss')
plt.plot(model.history.history['val_loss'], label='Val Loss')
plt.legend(loc=1)
plt.title('Conv - Training Process')
plt.show()
In [311]:
pred, y_true, y_pred = functions.evaluation(
                    X_test, y_test, model, random=False, n_preds=50, 
                    show_graph=True)
MSE: 2.2158664084768313e-05
Accuracy: 60%

Insight

Convolutional Neural Network (CNN) has higher MSE compared to LSTM. Accuracy for finding underlying trend is also lower. Hence, it is much better to use LSTM when predicting time series data.

4.3 LSTM + Convolutional

In [440]:
keras.backend.clear_session()

n_steps = X_train.shape[1]
n_features = X_train.shape[2]

model = Sequential()

model.add(LSTM(100, activation='relu', return_sequences=True,
               input_shape=(n_steps, n_features))) # 42%
# model.add(Dropout(0.5)) # {20: 42%, 30: 58%, 40: 58%, 50: 58%}
model.add(LSTM(50, activation='relu', return_sequences=True)) # 58%
# model.add(Dropout(0.5)) # {20: 42%, 30: 42%, 40: 58%, 50: 58%}
# model.add(LSTM(25, activation='relu', return_sequences=True)) # 54%

model.add(Conv1D(filters=20, kernel_size=2, activation='relu')) # 58%
model.add(MaxPooling1D(pool_size=2))
# model.add(Dropout(0.5)) # {20: 58%, 30: 58%, 40: 42%, 50: 42%}
# model.add(Conv1D(filters=10, kernel_size=2, activation='relu')) # 44%
# model.add(MaxPooling1D(pool_size=2))

model.add(Flatten())

# model.add(Dense(100)) # 42%
model.add(Dense(50)) # 60%
# model.add(Dropout(0.5)) # {20: 42%, 30: 42%, 40: 42%, 50: 42%}
model.add(Dense(1)) # 58%

model.compile(optimizer='adam', loss='mse', metrics=['mae'])
In [441]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (None, 21, 100)           48400     
_________________________________________________________________
lstm_1 (LSTM)                (None, 21, 50)            30200     
_________________________________________________________________
conv1d (Conv1D)              (None, 20, 20)            2020      
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 10, 20)            0         
_________________________________________________________________
flatten (Flatten)            (None, 200)               0         
_________________________________________________________________
dense (Dense)                (None, 50)                10050     
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 51        
=================================================================
Total params: 90,721
Trainable params: 90,721
Non-trainable params: 0
_________________________________________________________________
In [442]:
# batch_size = {8: 42%, 16: 58%, 32: 60%, 64: 42%}
model.fit(X_train, y_train, epochs=25, verbose=0,
          validation_data=(X_test, y_test), use_multiprocessing=True)

plt.figure(figsize=(16,4))
plt.plot(model.history.history['loss'], label='Loss')
plt.plot(model.history.history['val_loss'], label='Val Loss')
plt.legend(loc=1)
plt.title('LSTM+Conv - Training Process')
plt.show()
In [443]:
pred, y_true, y_pred = functions.evaluation(
                    X_test, y_test, model, random=False, n_preds=50, 
                    show_graph=True)
MSE: 4.765893720097423e-06
Accuracy: 60%

Insights

The LSTM+Convolutional Neural Network was able to beat the pure CNN model. However, it has higher and MSE and lower accuracy to pure LSTM model. It may need further optimization, hence we will use Bayesian Optimization on the LSTM+CNN model.

4.4 Bayesian Optimization

In [457]:
def create_model(u1, u2, d1, filters, pool, kernel):
    keras.backend.clear_session()
    
    u1 = int(u1)
    u2 = int(u2)
    d1 = int(d1)
    filters = int(filters)
    kernel = int(kernel)
    pool = int(pool)
    
    n_steps = X_train.shape[1]
    n_features = X_train.shape[2]
    model = Sequential()
    model.add(LSTM(u1, activation='relu', return_sequences=True,
                   input_shape=(n_steps, n_features)))
    model.add(LSTM(u2, activation='relu', return_sequences=True))
    model.add(Conv1D(filters=filters, kernel_size=kernel, activation='relu'))
    model.add(MaxPooling1D(pool_size=pool))
    model.add(Flatten())
    model.add(Dense(d1, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse', metrics=['mse'])
    model.fit(X_train, y_train, epochs=4, verbose=0,
              validation_data=(X_test, y_test), use_multiprocessing=True)
    score = model.evaluate(X_test, y_test, verbose=0)
    return score[1]
In [463]:
def bayesian_optimization():    

    pbounds = {
        'u1': (100, 250),
        'u2': (25, 62),
        'filters': (1, 20), 
        'd1': (25, 62),  
        'kernel': (2,10), 
        'pool': (2, 10)
    }

    optimizer = BayesianOptimization(
        f = create_model,
        pbounds = pbounds,
        random_state = 1,
        verbose = 2
    )
    optimizer.maximize(init_points=5, n_iter=5)
    print(optimizer.max)
In [464]:
n_steps = 21
scaled_usdphp = functions.scale(df, scale=(0,1))

X_train, \
y_train, \
X_test, \
y_test = functions.split_sequences(                  
    scaled_usdphp.to_numpy()[:-1], 
    df.Return.shift(-1).to_numpy()[:-1], 
    n_steps, 
    split=True, 
    ratio=0.8
)
In [465]:
bayesian_optimization()
|   iter    |  target   |    d1     |  filters  |  kernel   |   pool    |    u1     |    u2     |
-------------------------------------------------------------------------------------------------
|  1        |  1.191e-0 |  40.43    |  14.69    |  2.001    |  4.419    |  122.0    |  28.42    |
|  2        |  1.067e-0 |  31.89    |  7.566    |  5.174    |  6.311    |  162.9    |  50.35    |
|  3        |  1.107e-0 |  32.56    |  17.68    |  2.219    |  7.364    |  162.6    |  45.67    |
|  4        |  1.058e-0 |  30.19    |  4.764    |  8.406    |  9.746    |  147.0    |  50.62    |
|  5        |  1.136e-0 |  57.43    |  18.0     |  2.68     |  2.312    |  125.5    |  57.49    |
|  6        |  1.284e-0 |  61.51    |  18.11    |  7.854    |  9.709    |  249.1    |  25.77    |
|  7        |  1.014e-0 |  25.65    |  1.661    |  7.725    |  3.639    |  249.4    |  54.34    |
|  8        |  1.171e-0 |  59.72    |  2.56     |  7.534    |  6.101    |  100.1    |  25.52    |
|  9        |  1.184e-0 |  58.93    |  13.75    |  3.36     |  7.906    |  249.1    |  59.4     |
|  10       |  1.015e-0 |  25.07    |  2.53     |  3.982    |  7.502    |  249.6    |  25.91    |
=================================================================================================
{'target': 1.2840469025832135e-05, 'params': {'d1': 61.512177156494175, 'filters': 18.114271368376368, 'kernel': 7.854340930414312, 'pool': 9.708959733245273, 'u1': 249.14644327271185, 'u2': 25.768051700378287}}
In [466]:
n_steps = X_train.shape[1]
n_features = X_train.shape[2]
model = Sequential()
model.add(LSTM(249, activation='relu', return_sequences=True,
               input_shape=(n_steps, n_features)))
model.add(LSTM(25, activation='relu', return_sequences=True))
model.add(Conv1D(filters=18, kernel_size=7, activation='relu'))
model.add(MaxPooling1D(pool_size=9))
model.add(Flatten())
model.add(Dense(61, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse', metrics=['mse'])
In [467]:
model.fit(X_train, y_train, epochs=100, verbose=0,
          validation_data=(X_test, y_test), use_multiprocessing=True)

plt.figure(figsize=(16,4))
plt.plot(model.history.history['loss'], label='Loss')
plt.plot(model.history.history['val_loss'], label='Val Loss')
plt.legend(loc=1)
plt.show()

# Evaluation
pred, y_true, y_pred = functions.evaluation(
                    X_test, y_test, model, random=True, n_preds=100, 
                    show_graph=True)
MSE: 9.404466394988422e-06
Accuracy: 55%

Insight

Bayesian optimization didn't improve the MSE and accuracy of the LSTM+CNN model. This may be due to the choice of parameters. For further optimzation, a wider range of parameters could be used to improve the performance of the model.

Conclusion (editing)

  • ARIMA model can't surpass baseline model accuracy due to the historical data stochastic behaviour.
  • Recurrent (LSTM) neural network has the best MSE and accuracy among the models created in this project.
  • CNN model beat the baseline, however, LSTM is still much better.
  • LSTM+CNN has almost the same MSE with the pure LSTM model, however, its prediction in the underlying trends is not much better than the pure LSTM model.
  • Bayesian optimization didn't improve much accuracy of LSTM+CNN model.
  • The test set used in this project contains the time frame during the COVID-19 pandemice, hence it may have affected the results.
  • COVID-19 sucks.

Future Work (editing)

  • Incorporate fundamental analysis with historical data
  • Add recommendations from trading platforms
  • Expand analysis of forex to different exchange rates
In [ ]: